Overview & Motivation

Forests are essential for life on Earth. Covering around 31 percent of the world’s total land area, they are home to over 80 percent of land animals and to countless plants, some of them still undiscovered. One can say that forests are the backbone of entire ecosystems. Trees provide a significant part of the oxygen we breathe and absorb about 25 percent of greenhouse gases. We also depend on forests economically: about 1.6 billion people around the world earn their livelihoods from them. Furthermore, forests provide 40 percent of today’s global renewable energy supply, as much as solar, hydroelectric and wind power combined. Despite these benefits, forests across the world face several challenges, ranging from wildfires and human-driven deforestation to poor management and poor conservation in general. The loss of whole forests would have severe consequences for humanity and life on Earth.

With this project we seek to answer important questions that address these challenges. We want to identify the causes of forest destruction, highlight the importance of forests to our environment and predict trends in reforestation and deforestation. Moreover, we hope to show how reforestation can help tackle climate change, in particular how an increase in forest area helps to increase the buffer of sustainability. For the statistics cited so far, see our reference (accessed on 7 May 2021).

Global Forest Development

This part of our project answers the following questions:

  • What was the global forest development over the last 30 years?
    • What are the trends? (globally)
  • Is there a correlation between air pollution and the amount of forest in a country / globally?
    • Which countries have the most air pollution?

Overview and preprocessing of the available data

The first data set is about the forest cover of the continents. Forest land of continents

First we preprocess the data set: we rename the columns we need later to make them more readable.

  • The ‘Value’ column has been renamed to ForestSize to make it more understandable.
  • The ‘Area’ column has been renamed to Continent to give more meaningful information.
## 
## ── Column specification ────────────────────────────────────────────────────────
## cols(
##   Domain = col_character(),
##   Area = col_character(),
##   Element = col_character(),
##   Item = col_character(),
##   Year = col_double(),
##   Value = col_double()
## )
## # A tibble: 155 x 6
##    Domain      Continent Element Item         Year ForestSize
##    <chr>       <chr>     <chr>   <chr>       <dbl>      <dbl>
##  1 Forest Land Africa    Area    Forest land  1990    742801.
##  2 Forest Land Africa    Area    Forest land  1991    739526.
##  3 Forest Land Africa    Area    Forest land  1992    736251.
##  4 Forest Land Africa    Area    Forest land  1993    732976.
##  5 Forest Land Africa    Area    Forest land  1994    729700.
##  6 Forest Land Africa    Area    Forest land  1995    726425.
##  7 Forest Land Africa    Area    Forest land  1996    723150.
##  8 Forest Land Africa    Area    Forest land  1997    719875.
##  9 Forest Land Africa    Area    Forest land  1998    716599.
## 10 Forest Land Africa    Area    Forest land  1999    713324.
## # … with 145 more rows
## spec_tbl_df[,6] [155 × 6] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
##  $ Domain    : chr [1:155] "Forest Land" "Forest Land" "Forest Land" "Forest Land" ...
##  $ Continent : chr [1:155] "Africa" "Africa" "Africa" "Africa" ...
##  $ Element   : chr [1:155] "Area" "Area" "Area" "Area" ...
##  $ Item      : chr [1:155] "Forest land" "Forest land" "Forest land" "Forest land" ...
##  $ Year      : num [1:155] 1990 1991 1992 1993 1994 ...
##  $ ForestSize: num [1:155] 742801 739526 736251 732976 729700 ...
##  - attr(*, "spec")=
##   .. cols(
##   ..   Domain = col_character(),
##   ..   Area = col_character(),
##   ..   Element = col_character(),
##   ..   Item = col_character(),
##   ..   Year = col_double(),
##   ..   Value = col_double()
##   .. )
## [1] 155   6
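
The renaming step described above could look roughly like this in base R. This is only a sketch: the report does not show its own code, and the small data frame is a hand-copied (rounded) excerpt of the first rows printed above.

```r
# Excerpt of the continent data as printed above (first three rows, rounded).
forest <- data.frame(
  Domain  = "Forest Land",
  Area    = "Africa",
  Element = "Area",
  Item    = "Forest land",
  Year    = c(1990, 1991, 1992),
  Value   = c(742801, 739526, 736251)
)

# Rename the two columns we need later; all other names stay unchanged.
names(forest)[names(forest) == "Area"]  <- "Continent"
names(forest)[names(forest) == "Value"] <- "ForestSize"

print(names(forest))
```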

Here we have pre-processed our data set:

  • We have handled missing values by dropping the affected rows.
## # A tibble: 155 x 4
## # Groups:   Continent, Year [155]
##    Continent  Year ForestSize AvgforestSize
##    <chr>     <dbl>      <dbl>         <dbl>
##  1 Africa     1990    742801.       742801.
##  2 Americas   1990   1728946.      1728946.
##  3 Asia       1990    569978.       569978.
##  4 Europe     1990   1009734.      1009734.
##  5 Oceania    1990    184974.       184974.
##  6 Africa     1991    739526.       739526.
##  7 Americas   1991   1723550.      1723550.
##  8 Asia       1991    570130.       570130.
##  9 Europe     1991   1010579.      1010579.
## 10 Oceania    1991    184810.       184810.
## # … with 145 more rows
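
A minimal sketch of this step, assuming the missing values are simply removed before the average per continent and year is calculated. The input is a toy data frame: three values are copied (rounded) from the output above and one NA is introduced artificially.

```r
# Toy input; the NA is artificial and only there to show the effect of dropping.
forest <- data.frame(
  Continent  = c("Africa", "Africa", "Asia", "Asia"),
  Year       = c(1990, 1991, 1990, 1991),
  ForestSize = c(742801, NA, 569978, 570130)
)

# Drop rows with a missing ForestSize instead of imputing a value.
forest_clean <- forest[!is.na(forest$ForestSize), ]

# Average forest size per continent and year.
avg_forest <- aggregate(ForestSize ~ Continent + Year, data = forest_clean, FUN = mean)
names(avg_forest)[names(avg_forest) == "ForestSize"] <- "AvgforestSize"

print(avg_forest)
```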

Global Forest Trend

  • The plot shows the forest cover of the continents.
  • According to the plot, the forest cover of the continents has decreased over the last 30 years.
## Scale for 'y' is already present. Adding another scale for 'y', which will
## replace the existing scale.

Findings:

  • Forest land has decreased globally.

## 
## ── Column specification ────────────────────────────────────────────────────────
## cols(
##   Domain = col_character(),
##   Area = col_character(),
##   Element = col_character(),
##   Item = col_character(),
##   Year = col_double(),
##   Unit = col_character(),
##   Value = col_double()
## )
## Rows: 7,106
## Columns: 7
## $ Domain     <chr> "Forest Land", "Forest Land", "Forest Land", "Forest Land",…
## $ Country    <chr> "Afghanistan", "Afghanistan", "Afghanistan", "Afghanistan",…
## $ Element    <chr> "Area", "Area", "Area", "Area", "Area", "Area", "Area", "Ar…
## $ Item       <chr> "Forest land", "Forest land", "Forest land", "Forest land",…
## $ Year       <dbl> 1990, 1991, 1992, 1993, 1994, 1995, 1996, 1997, 1998, 1999,…
## $ Unit       <chr> "1000 ha", "1000 ha", "1000 ha", "1000 ha", "1000 ha", "100…
## $ ForestArea <dbl> 1208.44, 1208.44, 1208.44, 1208.44, 1208.44, 1208.44, 1208.…

Top 20 countries with largest forest area

## Selecting by avg_forest_size

Findings:

  • The Russian Federation has had the largest forest cover over the last 30 years.
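
The ranking behind this plot can be sketched as follows. The country names and numbers are made up for illustration, and `n = 2` stands in for the 20 countries used in the report; the column name `avg_forest_size` is taken from the console message above.

```r
# Toy data: ForestArea in 1000 ha; the numbers are invented for illustration.
forest_country <- data.frame(
  Country    = rep(c("A", "B", "C"), each = 2),
  Year       = rep(c(1990, 2020), times = 3),
  ForestArea = c(100, 90, 500, 520, 10, 12)
)

# Mean forest area per country over all available years.
avg <- aggregate(ForestArea ~ Country, data = forest_country, FUN = mean)
names(avg)[2] <- "avg_forest_size"

# Sort in descending order and keep the top n countries.
top_countries <- head(avg[order(-avg$avg_forest_size), ], n = 2)
print(top_countries)
```

Sorting in ascending order instead would give the countries with the least forest area.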

## `summarise()` has grouped output by 'Country'. You can override using the `.groups` argument.
## # A tibble: 7,106 x 3
## # Groups:   Country [242]
##    Country      Year forest_perYear
##    <chr>       <dbl>          <dbl>
##  1 Afghanistan  1990          0.607
##  2 Afghanistan  1991          0.607
##  3 Afghanistan  1992          0.607
##  4 Afghanistan  1993          0.606
##  5 Afghanistan  1994          0.606
##  6 Afghanistan  1995          0.606
##  7 Afghanistan  1996          0.605
##  8 Afghanistan  1997          0.605
##  9 Afghanistan  1998          0.605
## 10 Afghanistan  1999          0.605
## # … with 7,096 more rows

Countries with least forest area

## Selecting by avg_forest_size

Findings:

  • Curacao has had the least forest cover over the last 30 years.

The air pollution data contains the mean population exposure to PM2.5 in micrograms per cubic metre for the years 1990 to 2019. However, data for some years is not available. The air quality value reflects changes in the amount of pollution in the air.

## 
## ── Column specification ────────────────────────────────────────────────────────
## cols(
##   COU = col_character(),
##   Country = col_character(),
##   SMALL_SUBNATIONAL_REGION = col_character(),
##   `Small subnational region` = col_character(),
##   LARGE_SUBNATIONAL_REGION = col_character(),
##   `Large subnational region` = col_character(),
##   VAR = col_character(),
##   Variable = col_character(),
##   YEA = col_double(),
##   Year = col_double(),
##   `Unit Code` = col_character(),
##   Unit = col_character(),
##   `PowerCode Code` = col_double(),
##   PowerCode = col_character(),
##   `Reference Period Code` = col_logical(),
##   `Reference Period` = col_logical(),
##   Value = col_double(),
##   `Flag Codes` = col_logical(),
##   Flags = col_logical()
## )
## Rows: 3,444
## Columns: 19
## $ COU                        <chr> "AUS", "AUS", "AUS", "AUS", "AUS", "AUS", "…
## $ Country                    <chr> "Australia", "Australia", "Australia", "Aus…
## $ SMALL_SUBNATIONAL_REGION   <chr> "TOTAL", "TOTAL", "TOTAL", "TOTAL", "TOTAL"…
## $ `Small subnational region` <chr> "Total", "Total", "Total", "Total", "Total"…
## $ LARGE_SUBNATIONAL_REGION   <chr> "TOTAL", "TOTAL", "TOTAL", "TOTAL", "TOTAL"…
## $ `Large subnational region` <chr> "Total", "Total", "Total", "Total", "Total"…
## $ VAR                        <chr> "PWM_EX", "PWM_EX", "PWM_EX", "PWM_EX", "PW…
## $ Variable                   <chr> "Mean population exposure to PM2.5", "Mean …
## $ YEA                        <dbl> 1990, 1995, 2000, 2005, 2010, 2011, 2012, 2…
## $ Year                       <dbl> 1990, 1995, 2000, 2005, 2010, 2011, 2012, 2…
## $ `Unit Code`                <chr> "MICRO_M3", "MICRO_M3", "MICRO_M3", "MICRO_…
## $ Unit                       <chr> "Micrograms per cubic metre", "Micrograms p…
## $ `PowerCode Code`           <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0…
## $ PowerCode                  <chr> "Units", "Units", "Units", "Units", "Units"…
## $ `Reference Period Code`    <lgl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ `Reference Period`         <lgl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ AirPollution               <dbl> 7.60250, 7.49591, 7.36613, 6.90976, 6.78718…
## $ `Flag Codes`               <lgl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ Flags                      <lgl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## Rows: 3,444
## Columns: 3
## Groups: Country, Year [3,444]
## $ Country      <chr> "Australia", "Australia", "Australia", "Australia", "Aust…
## $ Year         <dbl> 1990, 1995, 2000, 2005, 2010, 2011, 2012, 2013, 2014, 201…
## $ AirPollution <dbl> 7.60250, 7.49591, 7.36613, 6.90976, 6.78718, 6.71166, 7.0…
## # A tibble: 14 x 3
##     Year TotalForestArea TotalAirPollution
##    <dbl>           <dbl>             <dbl>
##  1  1990        2668100.             4788.
##  2  1995        2690005.             5146.
##  3  2000        2646222.             5428.
##  4  2005        2606657.             5174.
##  5  2010        2570631.             5286.
##  6  2011        2564881.             5410.
##  7  2012        2586020.             5543.
##  8  2013        2580096.             5380.
##  9  2014        2574171.             5257.
## 10  2015        2568247.             5592.
## 11  2016        2562096.             5375.
## 12  2017        2555945.             5256.
## 13  2018        2549794.             5256.
## 14  2019        2543643.             5231.

Is there a correlation between Forest area and air pollution?

## `geom_smooth()` using formula 'y ~ x'

Findings: No linear relationship between Forest Area and Air Pollution

Correlation Value

## [1] "Kendall =  -0.032967032967033"

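Findings: The correlation coefficient is negative, which would mean that the variables move in opposite directions, but at about -0.03 it is so close to zero that there is practically no monotonic association.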
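
As a sketch, a Kendall coefficient can be calculated with `cor()` from base R. We assume here that the value above was computed from the 14 yearly totals; the numbers are copied from the printed table and are therefore rounded, so the result can differ slightly from the reported value.

```r
# Yearly totals copied (rounded) from the table printed above.
yearly <- data.frame(
  Year = c(1990, 1995, 2000, 2005, 2010:2019),
  TotalForestArea = c(2668100, 2690005, 2646222, 2606657, 2570631, 2564881, 2586020,
                      2580096, 2574171, 2568247, 2562096, 2555945, 2549794, 2543643),
  TotalAirPollution = c(4788, 5146, 5428, 5174, 5286, 5410, 5543,
                        5380, 5257, 5592, 5375, 5256, 5256, 5231)
)

# Kendall's tau is rank based and does not assume a linear relationship.
tau <- cor(yearly$TotalForestArea, yearly$TotalAirPollution, method = "kendall")
print(paste("Kendall = ", tau))
```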

Regression Analysis

fit model and estimate parameters

## parsnip model object
## 
## Fit time:  3ms 
## 
## Call:
## stats::lm(formula = ForestArea ~ AirPollution, data = data)
## 
## Coefficients:
##  (Intercept)  AirPollution  
##        15732          -116

Findings:

extrapolation

## # A tibble: 2 x 5
##   term         estimate std.error statistic  p.value
##   <chr>           <dbl>     <dbl>     <dbl>    <dbl>
## 1 (Intercept)    15732.    1541.      10.2  4.73e-24
## 2 AirPollution    -116.      48.7     -2.38 1.74e- 2

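Findings: The p-value of the AirPollution coefficient (0.0174) is below 0.05, so the slope differs significantly from zero at the 5% level. A low p-value is evidence against independence, not for it; the estimated relationship is nevertheless weak, as the scatter plot and the correlation value above indicate.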
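
The following sketch shows how such a coefficient table can be read in base R. The data is synthetic (generated with a fixed seed) and only mimics the structure of our variables; the report itself fits the model through parsnip, which wraps `stats::lm()` as the call above shows.

```r
set.seed(42)

# Synthetic data for illustration only; not the data used in the report.
n <- 200
AirPollution <- runif(n, min = 5, max = 80)
ForestArea   <- 15000 - 100 * AirPollution + rnorm(n, sd = 20000)
toy <- data.frame(ForestArea, AirPollution)

# Fit the linear model and extract the coefficient table.
fit   <- lm(ForestArea ~ AirPollution, data = toy)
coefs <- summary(fit)$coefficients
print(coefs)

# p-value of the slope: a small value means we reject the hypothesis
# that the slope is zero, i.e. that there is no linear association.
slope_p <- coefs["AirPollution", "Pr(>|t|)"]
print(slope_p)
```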

Residual Plot

## `geom_smooth()` using formula 'y ~ x'

Finding:

  • Since the relationship in our data is not linear, we use the residual plot to assess whether a linear regression is appropriate.

  • In the plot, the residuals of many data points lie far from zero, which shows that the model is not a good fit.

For more reliable results on the relationship between Forest Area and Air Pollution, we use the Predictive Power Score.

## Warning in score(df, x = param_grid[["x"]][i], y = param_grid[["y"]][i], : There are on average only 2.8 observations in each test-set for the ForestArea-AirPollution relationship.
## Model performance will be highly instable. Fewer cv_folds are advised.
## Warning in score(df, x = param_grid[["x"]][i], y = param_grid[["y"]][i], : There are on average only 2.8 observations in each test-set for the AirPollution-ForestArea relationship.
## Model performance will be highly instable. Fewer cv_folds are advised.
## Warning in pal_name(palette, type): Unknown palette Green

Findings:

Predictive Power Score of 0 shows no relationship between Forest Area and Air Pollution

Top 10 countries with most air pollution

## Selecting by avgAirPollution

Findings:

India has the highest air pollution in the last 30 years

Least air pollution

## Selecting by avgAirPollution

Finding: Nauru has the least air pollution in the last 30 years.

Air pollution trend in India

Findings:

  • Air pollution increased drastically after the year 2005.
  • It was highest in 2012.
  • It decreased in 2005 by ~10 micrograms per cubic metre.
  • There was a sudden decrease around 2017.
  • It increased slightly again in 2019.

Air pollution trend in Nauru

Findings:

  • Air pollution varies between 5 and ~7 micrograms per cubic metre.
  • It was lowest in 2014.

Deforestation

This part deals with questions related to deforestation and the environment:

  • Deforestation
    • Which countries are the main drivers of deforestation?
    • What are the trends? (by continents)
    • Prediction of how many years it would take until all forests are lost based on the current rate of change.
  • Relation to other environmental issues
    • Prediction of how much forest area has to be further increased to tackle the greenhouse gas emissions (what percentage of greenhouse gas decrease will occur if forest area increases).

Drivers of deforestation and forests in the future

The interactive map shows the forest data of each country when hovering over it.

  • Information shown for each country
    • Percentage of forest lost in the last three decades
    • Deforestation in the last 30 years
    • Current forest area (1 unit = 1000 ha)
    • Number of years until the complete forest area will be lost, based on the deforestation of the specific country in the last three decades.

Countries are shown in different colors depending on the percentage of forest lost in the last three decades. Countries with the highest percentage of forest lost are the main drivers of deforestation.

Note: for some countries the deforestation data was zero or not available; for them, deforestation was calculated as the difference in forest area between 1990 and 2020.
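
The two derived figures can be sketched as below, assuming a constant yearly loss equal to the average loss between 1990 and 2020. The function names are hypothetical and the numbers in the usage lines are invented.

```r
# Percentage of the 1990 forest area that was lost by 2020.
pct_forest_lost <- function(area_1990, area_2020) {
  100 * (area_1990 - area_2020) / area_1990
}

# Years until the remaining forest is gone if the average yearly loss
# of the observed period simply continues (a strong simplification).
years_until_lost <- function(area_1990, area_2020, span_years = 30) {
  yearly_loss <- (area_1990 - area_2020) / span_years
  if (yearly_loss <= 0) {
    return(Inf)  # no net loss, so the forest is never used up
  }
  area_2020 / yearly_loss
}

# Invented example: 1000 units in 1990 and 700 units in 2020.
print(pct_forest_lost(1000, 700))
print(years_until_lost(1000, 700))
```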

Deforestation trends by continents

The bar chart below shows the deforestation trends over the last three decades across 5 continents.

Note: here Americas refers to North and South America, while Oceania refers to Australasia, Melanesia, Micronesia and Polynesia.

Required forests for current CO2 emissions

The world map below shows, country by country, the amount of forest area required to offset the current CO2 emissions. As we only had CO2 emission data for every country from 1990-2018, we used ARIMA time series models to predict future CO2 emissions. For this purpose, we created a function that predicts future CO2 emissions for every country.
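
A per-country prediction function of this kind might look like the sketch below. It uses `arima()` and `predict()` from base R with a fixed model order; the order, the function name `predict_co2` and the synthetic series are assumptions made for illustration and not the settings used in the report.

```r
set.seed(1)

# Synthetic yearly series for one hypothetical country, 1990-2018 (29 values).
co2 <- ts(100 + cumsum(rnorm(29)), start = 1990)

# Fit an ARIMA model and return point forecasts for the next n_ahead years.
# The order c(1, 1, 0) is an arbitrary choice for this sketch.
predict_co2 <- function(series, n_ahead = 5, order = c(1, 1, 0)) {
  fit <- arima(series, order = order, method = "ML")
  as.numeric(predict(fit, n.ahead = n_ahead)$pred)
}

future_co2 <- predict_co2(co2, n_ahead = 5)
print(future_co2)
```

Applying the function to each country's series, for example with `lapply()` over a list of series, would give one forecast per country.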

We take into account the amount of CO2 in metric tons that forests absorb (one acre of new forest can sequester about 2.5 tons of carbon annually). The predicted CO2 value is then used to find the amount of additional forest required.
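
The conversion can be sketched as follows. The sequestration rate is the figure quoted above; the acre to hectare factor is the standard conversion. We assume that emissions and sequestration are expressed in the same unit, and the function name is hypothetical.

```r
TONS_PER_ACRE_PER_YEAR <- 2.5       # rate quoted in the text above
HECTARES_PER_ACRE      <- 0.404686  # standard conversion factor

# Forest area (in ha) needed to absorb a given yearly amount of emissions.
required_forest_ha <- function(emission_tons) {
  acres <- emission_tons / TONS_PER_ACRE_PER_YEAR
  acres * HECTARES_PER_ACRE
}

# Invented example: 2.5 million tons per year need 1 million acres.
print(required_forest_ha(2.5e6))
```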

Reforestation

In this part we take a closer look at reforestation by analyzing which countries are the main drivers, whether there is a correlation between reforestation and deforestation, and what the trends across continents are.

Which countries are the main drivers of reforestation?

As these results mainly show countries with a large surface area, we put the increase in reforestation from 1990-2020 in relation to the forest area in 1990.
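
A minimal sketch of this normalization, with invented numbers and hypothetical column names:

```r
# Invented example: one large and one small country.
ref <- data.frame(
  Country                 = c("Large", "Small"),
  forest_1990             = c(100000, 200),
  reforestation_1990_2020 = c(5000, 100)
)

# Reforestation relative to the forest area in 1990, in percent.
ref$relative_increase_pct <- 100 * ref$reforestation_1990_2020 / ref$forest_1990

# The small country now ranks first although its absolute increase is lower.
print(ref[order(-ref$relative_increase_pct), ])
```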

(source of idea https://plotly.com/r/choropleth-maps/#introduction-main-parameters-for-choropleth-outline-maps)

Is there a correlation between reforestation and deforestation?

First of all, we show the relation between total reforestation and deforestation in the last 30 years to get a first impression.

As there are many outliers with very high deforestation or reforestation figures, we zoom in by changing the scale.

The figure already shows a widely scattered, non-linear distribution of our data.

With the Shapiro-Wilk test we check whether our data are normally distributed.

## 
##  Shapiro-Wilk normality test
## 
## data:  corref$totalref
## W = 0.25387, p-value < 0.00000000000000022
## 
##  Shapiro-Wilk normality test
## 
## data:  corref$totaldef
## W = 0.16293, p-value < 0.00000000000000022

The p-values are below 0.05 for both reforestation and deforestation, so the data deviate significantly from a normal distribution, a result which the graph already suggested.
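
For illustration, this is how the test behaves on strongly skewed data. The sample is synthetic (log-normal, fixed seed) and merely stands in for our reforestation figures.

```r
set.seed(123)

# Synthetic, heavily right-skewed sample of the same size as our data (236 values).
skewed <- rlnorm(236, meanlog = 0, sdlog = 2)

# Null hypothesis of the Shapiro-Wilk test: the sample is normally distributed.
res <- shapiro.test(skewed)
print(res)

# A p-value below 0.05 means we reject normality.
is_normal <- res$p.value >= 0.05
print(is_normal)
```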

As the data are not normally distributed, we choose the Spearman method to calculate the correlation; the Pearson result is shown for comparison.

## # A tibble: 2 x 3
##   term     totalref totaldef
##   <chr>       <dbl>    <dbl>
## 1 totalref   NA        0.206
## 2 totaldef    0.206   NA
## 
##  Pearson's product-moment correlation
## 
## data:  corref$totalref and corref$totaldef
## t = 3.2181, df = 234, p-value = 0.001473
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  0.08027866 0.32502440
## sample estimates:
##       cor 
## 0.2058687
## # A tibble: 2 x 3
##   term     totalref totaldef
##   <chr>       <dbl>    <dbl>
## 1 totalref   NA        0.451
## 2 totaldef    0.451   NA
## 
##  Spearman's rank correlation rho
## 
## data:  corref$totalref and corref$totaldef
## S = 1201838, p-value = 0.0000000000003009
## alternative hypothesis: true rho is not equal to 0
## sample estimates:
##       rho 
## 0.4513834

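With a value of 0.451, the Spearman coefficient shows a moderate positive correlation: countries with more deforestation also tend to have more reforestation. A relationship exists, although the correlation alone does not show that deforestation causes reforestation.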
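
Both coefficients can be obtained with `cor.test()` from base R, as sketched here. The two vectors are synthetic and skewed on purpose; they only reuse the names `totalref` and `totaldef` from the output above.

```r
set.seed(7)

# Synthetic, skewed data with a monotonic but non-linear relationship.
totaldef <- rlnorm(236, sdlog = 2)
totalref <- sqrt(totaldef) * rlnorm(236, sdlog = 1)

# Pearson measures linear association and is sensitive to outliers.
pearson  <- cor.test(totalref, totaldef, method = "pearson")

# Spearman works on ranks and needs no normality assumption.
spearman <- cor.test(totalref, totaldef, method = "spearman")

print(c(pearson = unname(pearson$estimate), spearman = unname(spearman$estimate)))
```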

Forest Destruction

The next part of our project is about forest destruction and tries to answer the following questions:

  • What were the main causes of forest destruction?
  • Which countries were affected the most?
  • How much forest was destroyed by each of these causes?
  • Is there a correlation between rising temperatures and wildfires?
  • Prediction of where and when wildfires are likely to occur.

Overview and Preprocessing of the available data

Our first data set is about forest disturbances (e.g. fire, weather) for the period 2000-2017:

FAO. 2020. Global Forest Resources Assessment 2020 - data set 2

## Rows: 4,248
## Columns: 14
## $ regions          <chr> "North and Central America", "North and Central Ameri…
## $ iso3             <chr> "ABW", "ABW", "ABW", "ABW", "ABW", "ABW", "ABW", "ABW…
## $ name             <chr> "Aruba", "Aruba", "Aruba", "Aruba", "Aruba", "Aruba",…
## $ year             <dbl> 2000, 2001, 2002, 2003, 2004, 2005, 2006, 2007, 2008,…
## $ boreal           <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,…
## $ temperate        <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,…
## $ tropical         <dbl> 100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100…
## $ subtropical      <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,…
## $ `5a_insect`      <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ `5a_diseases`    <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ `5a_weather`     <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ `5a_other`       <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ `5b_fire_land`   <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ `5b_fire_forest` <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…

First we remove everything that we won’t need and make the names of the columns we need later more readable:

  • The ‘regions’ column is replaced by a continent column, which provides more intuitive information.
  • The ‘iso3’ column is removed because it doesn’t provide any further information.
  • The name and continent columns are converted to factor variables.

We also replace missing values with 0, because there is no safe way to impute these values. However, this implies that our results for this part of the project are probably an underestimate.
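
A sketch of the renaming and the replacement of missing values in base R. The two rows are a toy excerpt (the value 12 is invented), and the mapping of `5b_fire_forest` to `Fire` is an assumption based on the column names in the output.

```r
# Toy excerpt with the original column names; the value 12 is invented.
dist <- data.frame(
  name             = c("Aruba", "Aruba"),
  year             = c(2000, 2001),
  `5a_insect`      = c(NA, 12),
  `5b_fire_forest` = c(NA_real_, NA_real_),
  check.names      = FALSE
)

# Give the columns readable names.
names(dist)[names(dist) == "5a_insect"]      <- "Insects"
names(dist)[names(dist) == "5b_fire_forest"] <- "Fire"

# Replace missing values with 0 in all numeric columns.
num_cols <- sapply(dist, is.numeric)
dist[num_cols][is.na(dist[num_cols])] <- 0

print(dist)
```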

## # A tibble: 4,248 x 13
##    name  continent  year Insects Diseases Weather Others  Fire boreal temperate
##    <fct> <fct>     <dbl>   <dbl>    <dbl>   <dbl>  <dbl> <dbl>  <dbl>     <dbl>
##  1 Aruba Americas   2000       0        0       0      0     0      0         0
##  2 Aruba Americas   2001       0        0       0      0     0      0         0
##  3 Aruba Americas   2002       0        0       0      0     0      0         0
##  4 Aruba Americas   2003       0        0       0      0     0      0         0
##  5 Aruba Americas   2004       0        0       0      0     0      0         0
##  6 Aruba Americas   2005       0        0       0      0     0      0         0
##  7 Aruba Americas   2006       0        0       0      0     0      0         0
##  8 Aruba Americas   2007       0        0       0      0     0      0         0
##  9 Aruba Americas   2008       0        0       0      0     0      0         0
## 10 Aruba Americas   2009       0        0       0      0     0      0         0
## # … with 4,238 more rows, and 3 more variables: tropical <dbl>,
## #   subtropical <dbl>, 5b_fire_land <dbl>
The second data set we use is about mean surface temperature changes for the period 1961-2020:

FAO. 2021. FAOSTAT Temperature Change Dataset

## Rows: 537,370
## Columns: 10
## $ `Area Code`    <dbl> 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2…
## $ Area           <chr> "Afghanistan", "Afghanistan", "Afghanistan", "Afghanist…
## $ `Months Code`  <dbl> 7001, 7001, 7001, 7001, 7001, 7001, 7001, 7001, 7001, 7…
## $ Months         <chr> "January", "January", "January", "January", "January", …
## $ `Element Code` <dbl> 7271, 7271, 7271, 7271, 7271, 7271, 7271, 7271, 7271, 7…
## $ Element        <chr> "Temperature change", "Temperature change", "Temperatur…
## $ `Year Code`    <dbl> 1961, 1962, 1963, 1964, 1965, 1966, 1967, 1968, 1969, 1…
## $ Year           <dbl> 1961, 1962, 1963, 1964, 1965, 1966, 1967, 1968, 1969, 1…
## $ Value          <dbl> 0.746, 0.009, 2.695, -5.277, 1.827, 3.629, -1.436, 0.38…
## $ Flag           <chr> "Fc", "Fc", "Fc", "Fc", "Fc", "Fc", "Fc", "Fc", "Fc", "…

Again we remove everything that we won’t need:

  • All ‘Code’-columns: Add no information.
  • All years that are not in 2000-2017: No forest data for these years.
  • The Unit column: it’s always °C.
  • All missing values (Flags != “Fc”) and afterwards the flag column.
## # A tibble: 151,621 x 5
##    Area        Months  Element             Year  Value
##    <chr>       <chr>   <chr>              <dbl>  <dbl>
##  1 Afghanistan January Temperature change  2000  1.60 
##  2 Afghanistan January Temperature change  2001 -0.569
##  3 Afghanistan January Temperature change  2002  1.64 
##  4 Afghanistan January Temperature change  2003  2.54 
##  5 Afghanistan January Temperature change  2004  2.74 
##  6 Afghanistan January Temperature change  2005  0.172
##  7 Afghanistan January Temperature change  2006 -1.49 
##  8 Afghanistan January Temperature change  2007  0.582
##  9 Afghanistan January Temperature change  2008 -5.40 
## 10 Afghanistan January Temperature change  2009  1.63 
## # … with 151,611 more rows
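
The filtering steps listed above could be sketched like this. The three rows are a toy excerpt: the 1999 value is invented, and the flag "M" is a hypothetical placeholder for any flag other than "Fc".

```r
# Toy excerpt of the temperature data with the original column layout.
temp <- data.frame(
  `Area Code` = c(2, 2, 2),
  Area        = "Afghanistan",
  Months      = "January",
  Element     = "Temperature change",
  `Year Code` = c(1999, 2000, 2001),
  Year        = c(1999, 2000, 2001),
  Unit        = "\u00b0C",
  Value       = c(0.5, 1.60, NA),
  Flag        = c("Fc", "Fc", "M"),
  check.names = FALSE
)

# Keep only the years covered by the forest data and rows flagged "Fc".
keep_rows <- temp$Year >= 2000 & temp$Year <= 2017 & temp$Flag == "Fc"

# Drop all code columns as well as the Unit and Flag columns.
drop_cols <- grepl("Code$", names(temp)) | names(temp) %in% c("Unit", "Flag")

temp_clean <- temp[keep_rows, !drop_cols]
print(temp_clean)
```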

After preprocessing we can start answering the questions.
For the first two questions we decided to look at a global scale and for the country Germany.
For the third question we decided to look at a global scale and for the continent Europe.

Note: In our interactive shiny website you can pick the country / continent of your interest.

What were the main causes of forest destruction?

Findings:

  • On a global scale wildfires are clearly the dominant cause of forest destruction over the years.
  • However, this does not apply for every individual country:
    • Germany’s main cause for forest destruction are insects.
    • Note: the peak in this plot was caused by the heat wave in 2003.
  • There is no obvious trend to find over this small time scale.

How much forest was destroyed by each of these causes?

Findings:

  • On a global scale wildfires make up more than 50% of forest destruction, destroying more than 1 Billion ha of forest.
  • Insects make up almost 25% of forest destruction, destroying roughly 500 Million ha of forest.
  • These two are clearly the main drivers of global forest destruction.
  • Germany:
    • Insects and diseases are the main drivers of German forest destruction, destroying approximately 2.5 million ha of forest over this 18-year period.
    • Wildfires are no significant problem in Germany.

Which countries were affected the most?

Findings:

  • As one would expect from our previous findings, the most affected countries mostly struggle with wildfires.
  • The only exceptions to this are: USA, Canada, China, Sudan and Mexico.
  • Brazil is the most affected country and has a huge wildfire problem.
    • Assumption: since Brazil is a tropical region and therefore very humid these wildfires are probably caused by humans.
  • Europe:
    • Except for Russia, Europe’s most affected countries have no problem with wildfires.

Is there a correlation between rising temperatures and wildfires?

Overview:

Our first visualization doesn’t suggest any linear correlation, but it can be improved to make the interpretation clearer.

Next we compare global yearly temperature changes to the global forest area destroyed by wildfires.

## `summarise()` has grouped output by 'year'. You can override using the `.groups` argument.

There is no visible linear correlation between rising temperatures and either land fires or forest fires.

Now we compare global yearly temperature changes to the global count of wildfires.

Again there is no visible linear correlation between rising temperatures and either land fires or forest fires. Maybe we can find a correlation if we go into more detail and show not only global values but values for each country and each year:

We need a way to quantify these results. Since the data is clearly not linear, we use the Predictive Power Score [..] as a measure of correlation.

## Warning in score(df, x = param_grid[["x"]][i], y = param_grid[["y"]][i], : There are on average only 3.6 observations in each test-set for the Wildfires-Temperature relationship.
## Model performance will be highly instable. Fewer cv_folds are advised.
## Warning in score(df, x = param_grid[["x"]][i], y = param_grid[["y"]][i], : There are on average only 3.6 observations in each test-set for the Temperature-Wildfires relationship.
## Model performance will be highly instable. Fewer cv_folds are advised.
## Warning in score(df, x = param_grid[["x"]][i], y = param_grid[["y"]][i], : There are on average only 3.6 observations in each test-set for the Wildfires-Temperature relationship.
## Model performance will be highly instable. Fewer cv_folds are advised.
## Warning in score(df, x = param_grid[["x"]][i], y = param_grid[["y"]][i], : There are on average only 3.6 observations in each test-set for the Temperature-Wildfires relationship.
## Model performance will be highly instable. Fewer cv_folds are advised.

Findings:

  • There is clearly no correlation recognizable in our data between temperature increase and wildfires.

Prediction of where and when wildfires are likely to occur.

First we look at the available data and see which columns might help us to make a prediction.

## # A tibble: 151,621 x 5
##    Area        Months  Element             Year  Value
##    <chr>       <chr>   <chr>              <dbl>  <dbl>
##  1 Afghanistan January Temperature change  2000  1.60 
##  2 Afghanistan January Temperature change  2001 -0.569
##  3 Afghanistan January Temperature change  2002  1.64 
##  4 Afghanistan January Temperature change  2003  2.54 
##  5 Afghanistan January Temperature change  2004  2.74 
##  6 Afghanistan January Temperature change  2005  0.172
##  7 Afghanistan January Temperature change  2006 -1.49 
##  8 Afghanistan January Temperature change  2007  0.582
##  9 Afghanistan January Temperature change  2008 -5.40 
## 10 Afghanistan January Temperature change  2009  1.63 
## # … with 151,611 more rows
## # A tibble: 4,248 x 13
##    name  continent  year Insects Diseases Weather Others  Fire boreal temperate
##    <fct> <fct>     <dbl>   <dbl>    <dbl>   <dbl>  <dbl> <dbl>  <dbl>     <dbl>
##  1 Aruba Americas   2000       0        0       0      0     0      0         0
##  2 Aruba Americas   2001       0        0       0      0     0      0         0
##  3 Aruba Americas   2002       0        0       0      0     0      0         0
##  4 Aruba Americas   2003       0        0       0      0     0      0         0
##  5 Aruba Americas   2004       0        0       0      0     0      0         0
##  6 Aruba Americas   2005       0        0       0      0     0      0         0
##  7 Aruba Americas   2006       0        0       0      0     0      0         0
##  8 Aruba Americas   2007       0        0       0      0     0      0         0
##  9 Aruba Americas   2008       0        0       0      0     0      0         0
## 10 Aruba Americas   2009       0        0       0      0     0      0         0
## # … with 4,238 more rows, and 3 more variables: tropical <dbl>,
## #   subtropical <dbl>, 5b_fire_land <dbl>
## # A tibble: 3,809 x 14
##    name  continent  year Insects Diseases Weather Others wildfire boreal
##    <chr> <fct>     <dbl>   <dbl>    <dbl>   <dbl>  <dbl> <fct>     <dbl>
##  1 Aruba Americas   2003       0        0       0      0 No            0
##  2 Aruba Americas   2004       0        0       0      0 No            0
##  3 Aruba Americas   2005       0        0       0      0 No            0
##  4 Aruba Americas   2006       0        0       0      0 No            0
##  5 Aruba Americas   2007       0        0       0      0 No            0
##  6 Aruba Americas   2010       0        0       0      0 No            0
##  7 Aruba Americas   2011       0        0       0      0 No            0
##  8 Aruba Americas   2012       0        0       0      0 No            0
##  9 Aruba Americas   2013       0        0       0      0 No            0
## 10 Aruba Americas   2014       0        0       0      0 No            0
## # … with 3,799 more rows, and 5 more variables: temperate <dbl>,
## #   tropical <dbl>, subtropical <dbl>, 5b_fire_land <dbl>, temp_increase <dbl>

We got roughly 3800 samples for the prediction

## # A tibble: 2 x 6
##   .metric  .estimator  mean     n std_err .config             
##   <chr>    <chr>      <dbl> <int>   <dbl> <chr>               
## 1 accuracy binary     0.595     5 0.00994 Preprocessor1_Model1
## 2 roc_auc  binary     0.575     5 0.0130  Preprocessor1_Model1
## # A tibble: 2 x 6
##   .metric  .estimator  mean     n std_err .config             
##   <chr>    <chr>      <dbl> <int>   <dbl> <chr>               
## 1 accuracy binary     0.586     5 0.0141  Preprocessor1_Model1
## 2 roc_auc  binary     0.533     5 0.00705 Preprocessor1_Model1
## x Fold2: preprocessor 1/1, model 1/1 (predictions): Error in model.frame.default(...
## # A tibble: 2 x 6
##   .metric  .estimator  mean     n std_err .config             
##   <chr>    <chr>      <dbl> <int>   <dbl> <chr>               
## 1 accuracy binary     0.929     4 0.00474 Preprocessor1_Model1
## 2 roc_auc  binary     0.930     4 0.00549 Preprocessor1_Model1
## Warning: Cannot retrieve the data used to build the model (model.frame: object 'wildfire' not found).
## To silence this warning:
##     Call rpart.plot with roundint=FALSE,
##     or rebuild the rpart model with model=TRUE.

## # A tibble: 1 x 3
##   .metric  .estimator .estimate
##   <chr>    <chr>          <dbl>
## 1 accuracy binary         0.924
## Warning: Cannot retrieve the data used to build the model (model.frame: object 'wildfire' not found).
## To silence this warning:
##     Call rpart.plot with roundint=FALSE,
##     or rebuild the rpart model with model=TRUE.

## # A tibble: 1 x 3
##   .metric  .estimator .estimate
##   <chr>    <chr>          <dbl>
## 1 accuracy binary         0.736

Findings:

  • Our most accurate prediction depends only on the given country and is therefore of little use for finding the underlying causes of wildfires.
  • If we remove the country as a predictor, the main factor deciding whether a wildfire occurs or not is the temperature, which seems to contradict our findings for the previous question.
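
To illustrate the two findings above, here is a minimal sketch that compares a classification tree with and without the country as a predictor. It calls rpart directly and uses a simple hold-out split instead of the 5-fold cross-validation seen in the metrics output. The data (countries A, B and C, their wildfire rates) and the helper `holdout_accuracy` are made up for illustration; only the column names `name`, `temp_increase` and `wildfire` come from the table above.

```r
library(rpart)

set.seed(42)

# Hypothetical toy data: three made-up countries with different wildfire rates.
n <- 600
toy <- data.frame(
  name = factor(sample(c("A", "B", "C"), n, replace = TRUE)),
  temp_increase = runif(n, 0, 2)
)

# Assumption: wildfire probability depends strongly on the country
# and only weakly on the temperature increase.
base_rate <- c(A = 0.1, B = 0.5, C = 0.9)[as.character(toy$name)]
p <- pmin(pmax(base_rate + 0.1 * (toy$temp_increase - 1), 0), 1)
toy$wildfire <- factor(ifelse(runif(n) < p, "Yes", "No"), levels = c("No", "Yes"))

# Simple hold-out split (the report itself uses 5-fold cross-validation).
train_idx <- sample(n, size = 0.75 * n)
train <- toy[train_idx, ]
test <- toy[-train_idx, ]

# Hold-out accuracy of a classification tree for a given formula.
# model = TRUE stores the model frame, as suggested by the rpart.plot warning.
holdout_accuracy <- function(formula) {
  fit <- rpart(formula, data = train, method = "class", model = TRUE)
  pred <- predict(fit, newdata = test, type = "class")
  mean(pred == test$wildfire)
}

acc_with_country <- holdout_accuracy(wildfire ~ name + temp_increase)
acc_without_country <- holdout_accuracy(wildfire ~ temp_increase)
```

Comparing `acc_with_country` with `acc_without_country` shows how much of the accuracy comes from the country alone.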

Further questions

Correlation between greenhouse gas emissions and total carbon stored in forests

  1. Emission Dataset
  2. Carbon Stock Dataset
  3. Forest Area Dataset

Preprocessing

The first dataset, which contains the volume of carbon emitted per country and year (1990-2020), is processed as follows:

  1. Some columns were cleaned so that they are consistent with the others and easy to identify.
  2. Missing values and irrelevant columns were dropped.
  3. The year columns were reshaped into long format (pivot_longer).
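
The three steps above can be sketched as follows. This is not the original preprocessing code: the two countries, the values and the output column names (`country`, `year`, `emission`) are assumptions for illustration. The `"N/A"` strings mimic the parsing failures reported in the output, which is presumably why the `1990` column was read as character.

```r
library(dplyr)
library(tidyr)

# Hypothetical excerpt in the wide layout of the emissions file.
# The "N/A" string forces the 1990 column to be character.
emissions_wide <- tribble(
  ~Country,   ~Unit,    ~`1990`, ~`1991`, ~`1992`,
  "Aland",    "MtCO2e", "10.5",  11.0,    11.4,
  "Borduria", "MtCO2e", "N/A",   3.2,     3.1
)

emissions_long <- emissions_wide %>%
  # Step 1: consistent names and types.
  rename(country = Country) %>%
  mutate(`1990` = as.numeric(na_if(`1990`, "N/A"))) %>%
  # Step 2: drop columns that are not needed later.
  select(-Unit) %>%
  # Step 3: one row per country and year.
  pivot_longer(-country, names_to = "year", values_to = "emission") %>%
  mutate(year = as.integer(year)) %>%
  # Step 2 (continued): drop rows with missing values.
  drop_na(emission)
```

Converting `"N/A"` to `NA` explicitly before `as.numeric()` avoids the coercion warning.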
## 
## ── Column specification ────────────────────────────────────────────────────────
## cols(
##   .default = col_double(),
##   Country = col_character(),
##   `Data source` = col_character(),
##   Sector = col_character(),
##   Gas = col_character(),
##   Unit = col_character(),
##   `1990` = col_character()
## )
## ℹ Use `spec()` for the full column specifications.
## Warning: 20 parsing failures.
##  row  col expected actual                       file
## 1933 2010 a double    N/A 'historical_emissions.csv'
## 1933 2009 a double    N/A 'historical_emissions.csv'
## 1933 2008 a double    N/A 'historical_emissions.csv'
## 1933 2007 a double    N/A 'historical_emissions.csv'
## 1933 2006 a double    N/A 'historical_emissions.csv'
## .... .... ........ ...... ..........................
## See problems(...) for more details.
## Warning: NAs introduced by coercion

Analysis and Visualizations of GHG Emissions

  • Here we visualize the top 20 carbon emitters globally within the defined period (1990-2020). The result is shown below.
## Selecting by avgEmission
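
A minimal sketch of how such a ranking can be computed is given here. The data are made up, and only the column name `avgEmission` is taken from the message above, which is what `dplyr::top_n()` prints when no ordering column is named. Using `slice_max()` with an explicit column is one way to avoid it.

```r
library(dplyr)

# Hypothetical long-format emissions (country, year, emission).
emissions <- tibble(
  country = rep(c("A", "B", "C"), each = 2),
  year = rep(c(1990L, 1991L), times = 3),
  emission = c(10, 12, 3, 5, 7, 7)
)

top_emitters <- emissions %>%
  group_by(country) %>%
  summarise(avgEmission = mean(emission), .groups = "drop") %>%
  # Naming the ordering column avoids the "Selecting by ..." message.
  # The report would use n = 20; n = 2 suits this toy data.
  slice_max(avgEmission, n = 2)
```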

  • To see how the individual greenhouse gases contribute to total emissions, we plotted their proportions below.

  • The average emission per country from 1990 to 2020 is also visualized below.
## Warning in RColorBrewer::brewer.pal(10, "Reds"): n too large, allowed maximum for palette Reds is 9
## Returning the palette you asked for with that many colors
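
The palette warning above arises because the "Reds" palette provides at most 9 colours. One possible workaround, sketched here under the assumption that 10 shades are wanted, is to interpolate between the 9 available colours:

```r
library(RColorBrewer)

# "Reds" has at most 9 colours; interpolate between them to obtain 10.
reds10 <- colorRampPalette(brewer.pal(9, "Reds"))(10)
```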

Carbon stock

This dataset captures the carbon stock per country for nine selected years (1990, 2000, 2010, 2015, 2016, 2017, 2018, 2019 and 2020).

Preprocessing

  1. Some columns were converted to a consistent data type.
  2. Missing values were imputed with the mice library.
  3. Years not captured in the dataset were filled with the mean carbon stock per country.
  4. The carbon stock is given in tonnes/ha, that is, per hectare of forest.
  5. Hence, the carbon stock was multiplied by the forest area to obtain the total stock in tonnes.
  6. Finally, the emission dataset was merged with the carbon stock dataset to compute the correlation.
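
Steps 3 to 5 can be sketched as follows. The imputation with mice (step 2) is left out, and all table names, column names and values are hypothetical.

```r
library(dplyr)

# Hypothetical per-hectare carbon stock (tonnes/ha), surveyed years only.
stock <- tibble(
  country = c("A", "A", "B", "B"),
  year = c(1990L, 2000L, 1990L, 2000L),
  stock_per_ha = c(100, 110, 50, 70)
)

# Hypothetical forest area (ha) for every year of interest.
area <- tibble(
  country = rep(c("A", "B"), each = 3),
  year = rep(c(1990L, 1995L, 2000L), times = 2),
  area_ha = c(1000, 1000, 900, 200, 210, 220)
)

carbon_total <- area %>%
  left_join(stock, by = c("country", "year")) %>%
  group_by(country) %>%
  # Step 3: years without a survey value receive the country mean.
  mutate(stock_per_ha = coalesce(stock_per_ha, mean(stock_per_ha, na.rm = TRUE))) %>%
  ungroup() %>%
  # Steps 4-5: tonnes/ha times ha gives total tonnes.
  mutate(carbon_tonnes = stock_per_ha * area_ha)
```

For country A in 1995, the filled value is the mean of 100 and 110, so the total is 105 tonnes/ha times 1000 ha.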
## Warning: Number of logged events: 31
## New names:
## * `` -> ...12
## * `` -> ...13
## * `` -> ...14
## * `` -> ...15
## * `` -> ...16
## * ...
## 
## ── Column specification ────────────────────────────────────────────────────────
## cols(
##   Domain = col_character(),
##   Area = col_character(),
##   Element = col_character(),
##   Item = col_character(),
##   Year = col_double(),
##   Unit = col_character(),
##   Value = col_double()
## )

Analysis and Visualization of carbon stock

  • The global trend of carbon stock from 1990 to 2018 is shown below.

  • For comparison, the trend of emitted GHG from 1990 to 2018 is also shown.

  • Here we visualized the countries with the largest carbon stock

## Selecting by avgCarbon

Correlation proper

We address the correlation question here. We start by comparing both variables per year.

  • Correlation plot
## `geom_smooth()` using formula 'y ~ x'

Correlation result

## [1] "Kendall =  -0.954415954415954"
## [1] "Spearman =  -0.992673992673993"
## [1] "Pearson =  -0.871214159804568"
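
Coefficients of this kind can be computed with base R's `cor()`. The following is a minimal sketch with made-up yearly totals, not the original data. Pearson measures linear association, while Spearman and Kendall are rank-based and measure monotonic association.

```r
# Hypothetical yearly totals: emissions rise while carbon stock mostly falls.
emission <- c(30, 32, 33, 36, 38, 41)
carbon_stock <- c(900, 880, 885, 860, 850, 835)

# One coefficient per method, named after the method.
methods <- c("kendall", "spearman", "pearson")
coefs <- sapply(methods, function(m) cor(emission, carbon_stock, method = m))
```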

Interpretation of the correlation coefficients: the two variables are negatively correlated under all three methods, so years with higher emissions tend to be years with lower carbon stock. However, a causal relationship cannot be established from this, because correlation does not imply causation.

The Spearman and Kendall coefficients close to -1 indicate an almost perfectly monotonic decreasing relationship, while the Pearson coefficient of about -0.87 indicates a strong but not perfect linear relationship.

How much gas will be absorbed if forest area increases

Introduction

Here, the objective of our analysis is to investigate how much GHG will be absorbed if forest area is increased. To do this, we use the carbon stock and forest area datasets already loaded.

Analysis and Visualization

  • Comparison between absorbed carbon and forest area
## 
## ── Column specification ────────────────────────────────────────────────────────
## cols(
##   country = col_character(),
##   `1990` = col_double(),
##   `2000` = col_double(),
##   `2010` = col_double(),
##   `2015` = col_double(),
##   `2016` = col_double(),
##   `2017` = col_double(),
##   `2018` = col_double(),
##   `2019` = col_double(),
##   `2020` = col_double()
## )
## 
## ── Column specification ────────────────────────────────────────────────────────
## cols(
##   Domain = col_character(),
##   Area = col_character(),
##   Element = col_character(),
##   Item = col_character(),
##   Year = col_double(),
##   Unit = col_character(),
##   Value = col_double()
## )
## Warning: Number of logged events: 2
## `geom_smooth()` using formula 'y ~ x'

The result below was obtained using linear regression.

Result

## # A tibble: 2 x 5
##   term         estimate   std.error statistic  p.value
##   <chr>           <dbl>       <dbl>     <dbl>    <dbl>
## 1 (Intercept) 122.      1462.          0.0835 0.936   
## 2 totalArea     0.00250    0.000367    6.81   0.000252
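  • Carbon_absorbed = 122 + 0.00250 * totalArea (coefficients taken from the table above)

  • For a unit increase in total forest area, the volume of carbon absorbed is expected to increase by 0.00250.

  • If there are no forests, i.e. totalArea = 0, the model predicts a carbon volume of 122. This intercept is not statistically significant (p = 0.936), so it should not be interpreted.

  • Therefore, a unit increase in forest area should reduce the quantity of CO2 remaining in the atmosphere by about 0.00250, assuming the absorbed carbon comes from emitted CO2.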
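
A regression of this form can be fitted with `lm()`. This is a minimal sketch on made-up data constructed to lie exactly on a line whose slope matches the `totalArea` estimate in the table (0.00250). The intercept, the area values and the increase of 10,000 area units are assumptions for illustration. The table layout itself resembles what `broom::tidy()` returns for an `lm` fit.

```r
# Hypothetical yearly totals, constructed to lie exactly on a line.
forest <- data.frame(totalArea = c(4.00e6, 4.02e6, 4.05e6, 4.08e6, 4.10e6))
forest$carbon <- 100 + 0.0025 * forest$totalArea

# Simple linear regression of absorbed carbon on total forest area.
fit <- lm(carbon ~ totalArea, data = forest)
slope <- unname(coef(fit)["totalArea"])

# Predicted extra carbon for a hypothetical increase of 10,000 area units.
extra_area <- 1e4
extra_carbon <- slope * extra_area
```

With a slope of 0.0025, the extra 10,000 area units correspond to about 25 additional units of absorbed carbon.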

Summary

Resources

PPS Score code from ex_6_codealong session